Semi‑supervised end‑to‑end fake speech detection method based on time‑domain waveforms
FANG Xin, HUANG Zexin, ZHANG Yuhan, GAO Tian, PAN Jia, FU Zhonghua, GAO Jianqing, LIU Junhua, ZOU Liang
Journal of Computer Applications, 2023, 43(1): 227-231. DOI: 10.11772/j.issn.1001-9081.2021101845
Fake speech produced by modern speech synthesis and voice conversion systems poses a serious threat to automatic speaker recognition systems. Most existing fake speech detection systems perform well on attack types seen during training, but degrade significantly when detecting unknown attack types in practical applications. Therefore, building on the recently proposed Dual-Path Res2Net (DP-Res2Net), a semi-supervised end-to-end fake speech detection method based on time-domain waveforms was proposed. Firstly, semi-supervised learning was adopted for domain transfer to reduce the difference in data distribution between the training set and the test set. Then, for feature engineering, time-domain sampling points were input into DP-Res2Net directly, which increased the local multi-scale information and made full use of the dependence between audio segments. Finally, after the input features passed through the shallow convolution module, feature fusion module and global average pooling module, embedded tensors were obtained to distinguish fake speech from natural speech. The performance of the proposed method was evaluated on the publicly available ASVspoof 2021 Speech Deep Fake evaluation set as well as the Voice Conversion Challenge (VCC) dataset. Experimental results show that the Equal Error Rate (EER) of the proposed method is 19.97%, which is 10.8% lower than that of the official optimal baseline system, verifying that the proposed method is effective at recognizing unknown attacks and has higher generalization capability.
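The Equal Error Rate reported in the abstract is the operating point at which the false acceptance rate (fake speech accepted as natural) equals the false rejection rate (natural speech rejected as fake). The following is a minimal sketch of how such a value might be computed from detector scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean "more likely natural", and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a detector's scores.

    Assumption: a higher score means "more likely natural speech".
    """
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct score that was observed.
    thresholds = np.unique(np.concatenate([bonafide, spoof]))

    # False rejection rate: natural speech scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: fake speech scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # With finite data the two curves rarely cross exactly, so take the
    # threshold where they are closest and average the two rates.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


if __name__ == "__main__":
    # Hypothetical toy scores, not taken from the paper.
    eer, threshold = equal_error_rate(
        bonafide_scores=[0.9, 0.8, 0.7, 0.3],
        spoof_scores=[0.1, 0.2, 0.4, 0.6],
    )
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

Official challenge scoring scripts typically interpolate between thresholds; the nearest-point approximation here is chosen only to keep the sketch short.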